Interpretable generalized neural additive models for mortality prediction of COVID-19 hospitalized patients in Hamadan, Iran

Background The high number of COVID-19 deaths is a serious threat to the world. Demographic and clinical biomarkers are significantly associated with the mortality risk of this disease. This study aimed to implement Generalized Neural Additive Model (GNAM) as an interpretable machine learning method to predict the COVID-19 mortality of patients. Methods This cohort study included 2181 COVID-19 patients admitted from February 2020 to July 2021 in Sina and Besat hospitals in Hamadan, west of Iran. A total of 22 baseline features including patients' demographic information and clinical biomarkers were collected. Four strategies including removing missing values, mean, K-Nearest Neighbor (KNN), and Multivariate Imputation by Chained Equations (MICE) imputation methods were used to deal with missing data. Firstly, the important features for predicting binary outcome (1: death, 0: recovery) were selected using the Random Forest (RF) method. Also, synthetic minority over-sampling technique (SMOTE) method was used for handling imbalanced data. Next, considering the selected features, the predictive performance of GNAM for predicting mortality outcome was compared with logistic regression, RF, generalized additive model (GAMs), gradient boosting decision tree (GBDT), and deep neural networks (DNNs) classification models. Each model trained on fifty different subsets of a train-test dataset to ensure a model performance. The average accuracy, F1-score and area under the curve (AUC) evaluation indices were used for comparison of the predictive performance of the models. Results Out of the 2181 COVID-19 patients, 624 died during hospitalization and 1557 recovered. The missing rate was 3 percent for each patient. The mean age of dead patients (71.17 ± 14.44 years) was statistically significant higher than recovered patients (58.25 ± 16.52 years). Based on RF, 10 features with the highest relative importance were selected as the best influential features; including blood urea nitrogen (BUN), lymphocytes (Lym), age, blood sugar (BS), serum glutamic-oxaloacetic transaminase (SGOT), monocytes (Mono), blood creatinine (CR), neutrophils (NUT), alkaline phosphatase (ALP) and hematocrit (HCT). The results of predictive performance comparisons showed GNAM with the mean accuracy, F1-score, and mean AUC in the test dataset of 0.847, 0.691, and 0.774, respectively, had the best performance. The smooth function graphs learned from the GNAM were descending for the Lym and ascending for the other important features. Conclusions Interpretable GNAM can perform well in predicting the mortality of COVID-19 patients. Therefore, the use of such a reliable model can help physicians to prioritize some important demographic and clinical biomarkers by identifying the effective features and the type of predictive trend in disease progression. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01827-y.


Background
Since late 2019, the spread of SARS-CoV-2 pneumonia, known as COVID-19, began in Wuhan, China, and has become a worldwide pandemic disease [1]. In the treatment of COVID-19 patients, the assessment of demographic, laboratory biomarkers, and clinical risk factors as well as the identification of death predictors in these patients have always been considered as one of the challenges facing researchers [2]. Several studies have been performed to evaluate changes in levels and relationships between laboratory biomarkers such as aspartate aminotransferase (AST), alanine aminotransferase (ALT), lymphocytes (LYM), neutrophil (NEU), and lactate dehydrogenase (LDH) in patients with COVID-19 [3,4].
Recently, advanced models of medical information analysis have been extended that can help interpret complex biological relationships between clinical measurements and patient outcomes. Machine learning is a very powerful tool for identifying patterns, classifying clinical decision making, also identifying features of medical data that are relevant to clinical outcomes [5].
Prediction and classification models are designed to help healthcare professionals with some decisions such as using different diagnostic, starting or stopping treatments, using the available resources in a good way, and also can avoid some common biases in clinical decision making [6]. To estimate the probability that a specific outcome i.e. death will occur, risk prediction models are employed. These models used patient characteristics and the accuracy of the prediction depends on the ability of the model in discovering the complex relationship between patient characteristics and outcome [7,8]. Recently, various machine learning methods have been used to predict and classify caused by COVID-19 mortality, including the use of logistic regression, RF, and GBDT [9][10][11].
To improve the performance of any analysis such as identifying the most important features and classification analysis many preprocessing techniques can be applied. One of the most important stages of preprocessing is dealing with missing values in features. Some methods require complete data without missing values. Conducting the analysis without considering missing values will bias the results and make some analyzes impossible [12]. There are different strategies in dealing with missing values such as deleting missing cases, but may leads to bias, imputing missing values using statistical imputation methods i.e., univariate methods; the mode, mean, or zero, but the results are not optimal, imputing missing values with KNN, and MICE imputation methods [13][14][15][16].
In most cases, the relationship between features and the clinical outcomes is non-linear. For this situation, the classic models e.g. the linear regression models are inappropriate. There are methods such as GAMs that enable to capture of non-linear patterns [17]. GAMs are superior in several respects, and the purpose of using these models is to maximize the accuracy of response prediction and to discover the nonlinear relationships of predictor features while maintaining explain ability [18].
Machine learning algorithms seem to be suitable as a nonlinear method for data modeling as well as for predicting and classifying responses, because of automatic discovering the relationships between the data and being able to generate a suitable output with minimum error [19]. Among the machine learning methods that have been used for prediction, we can mention neural network-based methods such as DNNs, and GNAMs [20,21].
Despite the remarkable results of DNNs in predicting the effects of clinical biomarkers on virus infections [22] and biomedical studies [23], since these models are considered as black-box models, it is inexplicit to understand how they perform their predictions and how can be interpreted. Therefore lack of interpretability is an inevitable problem in applying these methods in fields such as healthcare. Interpretive machine learning method inspired by generalized additive models is an emerging research topic that seeks to solve this problem.
One of the suitable methods for debugging neural network predictions is the use of GNAMs which are inherently interpretable. Advantages of GNAMs include showing a larger class of classic GAMs, interpretable of a neural network model, and showing learned diagrams as accurate descriptions of predictions [20].
This study aimed to predict COVID-19 patient outcomes (dead/recovered) admitted to hospitals in Hamadan, Iran, GAMs will be used as a classical method and DNNs and GNAMs will be used as machine learning methods. The fitted models will be compared with Accuracy and AUC classification indices. The behavior of each feature in mortality risk will be visualized and interpreted based on GNAMs.

Methods
The method section is assigned into several subsections. First, data details used in this study was introduced in the subsection COVID-19 dataset. Then, in the subsection Data Imputation, imputation algorithms were introduced. In the subsection Data Description, the method of reporting the results is presented. In the subsection Feature Selection using random forest, the method of feature selection using the random forest algorithm was described. Finally, the classification models used in this study were fully explained in the Logistic regression model, Gradient Boosting Decision Tree (GBDT), Generalized Additive Models, Deep Neural Networks (DNN), and Generalized Neural Additive Models sub-sections. At last, in the Evaluation metrics and Class imbalanced issue subsection performance of the models was described.

COVID-19 dataset
In this cohort study, the dataset of 2181 Covid-19 patients who were admitted to Sina (COVID-19 treatment center) and Besat hospitals affiliated to Hamadan University of Medical Sciences, Iran were used. In this study, patients with positive real time reverse transcriptase polymerase chain reaction (RT-PCR) on samples from upper respiratory nasopharyngeal swabs were enrolled to the study. The study was approved by the Ethical Committee of the Hamadan University of Medical Science with the approved ethical code: IR.UMSHA.REC.1400.366. The dataset was collected from patient information from February 2020 to July 2021, which includes baseline demographic and clinical biomarkers. Demographic characteristics i.e. age, sex, smoking, compromised immune system (Com.immune. sys), renal insufficiency, diabetes, and hypertension as well as clinical biomarkers i.e. erythrocyte sedimentation rate (ESR), blood urea nitrogen (BUN), blood sugar (BS), blood creatinine (CR), prothrombin (PT), serum glutamic-pyruvic transaminase (SGPT), serum glutamicoxaloacetic transaminase (SGOT), alkaline phosphatase (Alp), thromboplastin or partial thromboplastin time (PTT), platelets (Plat), hematocrit (HCT), hemoglobin (Hb), lymphocytes (Lym), monocytes (Mono), and neutrophils (NUT) were collected from patient information. A total of 22 features (or input features) were retrieved, consisting of 15 clinical biomarkers and 7 demographic characteristics of patients. For all classification models the patient's recovery status considered as a binary outcome (death = 1 and recovery = 0).

Data Imputation
The missing rate in this study was 3 percent based on all features for each patient. A detailed description of the missing rate for each feature was reported in Table 1. In this study, we followed four different strategies to deal with missing data. First, discarding entire rows (cases) containing missing values and subsequent analysis was done. In this strategy, the information of 2117 patients was analyzed (Complete case dataset). In the three other strategies missing values were imputed by the mean, the KNN imputation, the MICE method, and subsequent analysis was done (Imputed dataset). In the mean imputation method, the missing values in features are imputed by the average of all the observations in that feature that are not missing. In the KNN imputation method, the missing values are imputed by an average of the corresponding values of the k nearest features which are computed by similarity measures such as Euclidean distance. In this study, k varies from 5 to 100 and the best k value for imputation was 10. The MICE imputation method based on fully conditional specification, first, calculates the mean of each feature that has a missing value and uses the mean as replacement values. Then (linear/logistic) regression models with chain equations are fitted using features with missing values and target feature. Finally, the missing values are predicted and updated with 100 iterations.
After applying different methods of dealing with missing data; the complete case dataset (for the first strategy) and imputed datasets separately were prepared for use in the following steps ( Fig. 1).

Data Description
The quantitative features described as mean ± standard deviation and qualitative features as frequency and percentage. Two independent sample t-test were used to compare the mean of the quantitative features between two groups. To investigate the relationship between the qualitative features in pairs, the Pearson Chi-square test was employed. The significance level was set at 0.05 in all analyses.

Feature selection using random forest
Due to the large number of input features, the RF algorithm was used as one of the most common approaches to identify important features that had acceptable results. RF is one of the supervised learning algorithms for classification and regression. The RF is an ensemble of several decision trees that grow using recursive partitioning of bootstrap samples. RF uses several indices to calculate the importance of features in predicting outcome, and one of them is the Gini index and is the value between zero to one [24]. In this study, the RF algorithm with 600 decision trees and the Gini index employed to calculate the importance of each feature, and the features with a relative importance value higher than 4% chose for further analysis. Also, the RF was considered as a classification model for comparing with the other models.

Logistic regression model
Logistic regression is a traditional statistical model used in the classification task. For a binary outcome, the logistic regression model is shown below: where for a given input feature x i = x 1i , … , x iq , i = 1, … , n; n is the number of training samples, q is the number of the input features, and p is the probability of belonging to class 1, the logarithm of the odds of this class ( log p 1−p ) is called the logit function which is a linear function of the input features. Also, β j are regression coefficients that are estimated based on the dataset from the maximum likelihood method [25].

Gradient Boosting Decision Tree (GBDT)
GBDT is a reinforcement algorithm in machine learning where several weak classifiers (individual decision trees) are constructed to form a strong classifier. By combining the results of each weak classifier, the end prediction results are obtained. For a binary outcome, GBDT used the decision tree as the weak classifier and makes global convergence of the algorithm by following the negative gradient [26]. Let x i = x 1i , . . . , x ip , i = 1, . . . , n , n and p are the number of training samples and the number of the input features, respectively, and y i ∈ {0, 1} n i=1 denoted input feature or binary target. The steps of GBDT are as follows: Step I: the model β is the initial constant value, for the regression model ( Y = βX): Step II: calculate the residuals; let F m : R p → R be a predictive model at iteration m,m = 1, . . . , M , and let L y i , F m (x i ) be a differentiable loss function. According to the least square approach, the parameter e m of the model is obtained and the model h(x i ; e m ) is fitted.
Step III: minimization of loss function; where β m is obtained by fitting a regression tree to the gradients of each sample concerning the current estimator at stage m; Step IV: update of the model and reduce overfitting ( h(x i ; e) so called learning rate); These steps are repeated until m trees are grown [27]. Be noted that, if the target feature was binary, the logistic regression was selected for the growing tree. Therefore, the model β is the initial constant value, for the logistic regression model;

Generalized additive models
GAMs are a semi-parametric extension of the generalized linear models, used for the case when there is no a priori reason for choosing a particular response function (such as linear, quadratic, etc.). GAMs are interpretable by design because of their functional forms. Given an input x j ∈ R D j = 1, . . . , p , where p is the number of features a binary response y i i = 1, . . . , n , where n is sample belongs to an exponential or non-exponential family distribution, a link function g (e.g. g is log π 1−π in binary classification, π is the probability of death), main effects f j for the j th feature called smooth functions with E f j = 0 . For a univariate response variable of multiple features, GAMs is expressed as follows [18]: where µ = E y i , β 0 is intercept parameter, and ε ∼ N 0, σ 2 is random variables. In GAMs the linear or nonlinear relationships between response and features (6) follow smooth patterns and are explained by unspecified smooth functions known as splines or basis function. The smoothness of each function determined by the smoothing regularization parameter known . In this study, the GAM was fitted by 30 to 300 basis functions, and the varies from 0.1 to 0.9.

Deep Neural Networks (DNN)
The structure of a shallow neural network consists of three layers, i.e., the input, the hidden, and the output was considered. Each layer has a weight that indicates the effect of the features on each other. The goal of neural network is to reduce the error or cost function in classification or regression and bring the network closer to the desired result. To achieve this goal the connection weights is update during training by various algorithms i.e. backpropagation, and amount of changes in weights control by a hyperparameter called learning rate. Another parameter known as batch size should be obtained which is the number of training examples in one forward or backward pass. In DNNs, the hidden layers are more than one, which may increase the classification and prediction accuracy of the network [28]. Schematic representation of the DNN architecture used in the current study with 3 hidden layers containing 100 nodes in each layer is given in Fig. 2. It should be noted that rectified linear unit (ReLU) activation function used for all hidden layers.
The mathematical form of this structure is given following; Fig. 2 The structure of the DNN where W matrix and b vectors are weight and bias in each layer.
Initial values of W matrix was chosen as random values from a normal distribution with mean and variance equal to zero and 0.2, respectively, and the initial values of b vector was chosen as 1.

Generalized Neural Additive Models
GNAMs belong to the GAMs family and learn a linear combination of multi-layer perceptron (MLP) with an input, an output, and several hidden layers. The output of each MLP is where ω is the network parameter, j = 1, . . . , p , where p is the number of features, and initial values of these parameters could be chosen as random values from a normal distribution with mean and variance equal to zero and 0.2, respectively. It should be noted that the weight distribution was selected based on the pre-trained source network proposed by Agarwal et al., [20] which was designed for a similar binary classification task.
The Exp-Centered (ExU) activation function in the hidden layer for each neuron computes given by where τ = 1, . . . , k , k is the number of neurons in the hidden layer of the j th MLP. The sum of all MLP outputs, in addition to the intercept of the model is g(µ) which is (9) f j x j = ω 1j ExU x j,1 + · · · + ω τ j ExU x j,τ , shown earlier in formula (1). In the last step to classify binary outcome, sigmoid activation function, h(.), was employed [20]. The architecture of the GNAMs with a hidden layer is presented in Fig. 3.
One of the characteristics of the GNAMs is interpreting by visualizing its corresponding smooth function from, f j (x j ) , versus x j . We take this advantage of the GNAMs and plot each f j (x j ) , versus x j for features extracted in feature selection step.
Based on the loss function below, the training of GNAMs is done, is the training set of size n for input (x) and target (y) features, l(x, y; θ) is the loss function, η(x; θ) = 1 K x k f θ k (x k ) 2 is the output penalty for K features (L2 norm), γ (θ ) is the weight decay for K features (L2 norm), f θ k is the feature network for the k th feature. The development of the network is also regularized based on feature dropout (drop collinearity features during training) and dropout related to smoothness (regularization of ExUs in each features until the smooth functions are learned while being able to represent jumps) with coefficients λ 3 and λ 4 respectively. Then, the cross-entropy loss function for binary target given as, (11) L(θ ) = E x,yD l x, y; θ + 1 η(x; θ ) + 2 γ (θ ), (12) l x, y; θ = −ylog(p θ (x)) − 1 − y log(1 − p θ (x)), Where p θ (x) = sigmoid β θ 0 + K k=1 f θ k (x k ) is predicted probability from output GNAMs [20].
Tuning of hyperparameters of Adam optimizer for training networks such as learning rate and batch size were tested based on the values between 0.001 to 0.2 and 100 to 500, respectively. With the aim of avoiding overfitting, the value of regularization parameters (λ) for DNN and GNAM was tuned and the optimized values were set as follows: the λ 1 (output penalty coefficient) in the discrete set {0.001, 0.01, 0.1}, λ 2 (weight decay coefficient) in the discrete set {0, 0.00001, 0.0001, 0.001, 0.01}, λ 3 (dropout coefficient) in the discrete set {0, 0.1, 0.3, 0.5, 0.7, 0.9} and λ 4 (feature dropout coefficient) in the discrete set {0, 0.05, 0.1}. The tuning of the λ 2 and λ 3 parameters for DNNs was the same as the tuning of the GNAMs model and the number of epochs for both models was 200. The desired range for mentioned parameters such as the number of hidden layer neurons, learning rate, number of the epoch, etc., was also valued based on the range proposed in Agarwal's study [20]. However, the specific optimal value of each hyperparameter was determined according to the data of the present study by the method of cross-validation.

Evaluation metrics and Class imbalanced issue
Firstly, regarding the imbalance ratio of 4:10 (minority/majority) for the imbalanced binary classification problem, data balancing was done using the synthetic minority over-sampling technique (SMOTE) method. For subsequent analysis, the dataset is randomly divided into the two subsets of train and test with a ratio of 7:3, respectively. The process of splitting the dataset into the train and test sets was repeated fifty times. The desired prediction models were fitted based on each data set and the evaluation indices for the respective train and test sets were calculated separately. The final performance of the models is calculated as the average of these iterations.
Then the logistic regression, RF, GBDT, GAM, DNN, and GNAM are trained based on the selected features. To compare the predictive performance of models the accuracy, F1-score and the area under the receiver operating characteristics curve (AUC) indices were employed. The steps are shown graphically in Fig. 1. For all evaluation metrics the closer the value to one showed the higher the diagnostic power of the test or the predictive accuracy of the model. The analysis of these methods was done by python 3.8 with sklearn and torch modules using Xeon ® 4210 Core i32 CPU with 128 GB ram memory and the source code for NAM models available at https:// neuraladdit ive-models. github. io. Although, depending on the conditions and available data, some changes have been made in the original codes.

Results
In this study, out of 2181 (53.6% male) COVID-19 patients, 1557 were recovered and 624 were dead. The mean age of recovered patients (58.25 ± 16.52 years) was significantly lower than dead patients (71.17 ± 14.44 years) (p < 0.001). Thoroughly, the frequency, percentage, mean, and standard deviation of the mentioned clinical biomarkers with their statistical significance between the two groups of dead and recovered patients are shown in Table 1.
After the implementation of four imputation strategies, the relative importance score of each feature was calculated by Gini index in RF classifier and the results were reported in Fig. 4. These figures confirmed that these different strategies have led to the selection of the same important features; BUN and Com.immune.sys selected as the most and the least important features, respectively. Ten first important features with the relative importance more than 4% were used as the final input features to fit the logistic, GBDT, RF, GAM, DNN, and GNAM models.
The optimal parameters values of different models were adjusted according to cross-validation in different imputation scenarios as follows: Based on the first strategy of handling missing value (remove cases with missing values); The best results for GAM were obtained by 250 basis functions and = 0.8 . For the GBDT, the number of trees and learning rate regularization parameters were set to 200 and 0.05, respectively. For the RF, the number of trees and the maximum depth of the tree were set to 300 and 10, respectively.
After various checks with different values of the regularization parameters, the DNN optimized by the λ 2 and λ 3 regularization parameters of 0.001 and 0.3, respectively, and the batch size 150, learning rate were set to 0.005, the number of hidden layers in this model containing 3 layers with 86, 64, and 16 neurons in each hidden layer, respectively.
The GNAM was optimized by the batch size of 150, the learning rate were set to 0.005, and the number of hidden layers in this model containing 3 layers with 86, 64, and 16 neurons, respectively. Also, the GNAM optimized by the λ 1 , λ 2 , λ 3 , and λ 4 regularization parameters were set to 0.01, 0.001, 0.5, and 0.05, respectively.
Based on three imputation strategies similar results were obtained; The best results based on GAM were obtained by 230 basis functions and = 0.75 . For the GBDT, the number of trees and learning rate regularization parameters were set to 200 and 0.04, respectively. For the RF, the number of trees and the maximum depth of the tree were set to 300 and 10, respectively.
After various checks with different values of the regularization parameters, the DNN optimized by the λ 2 and λ 3 regularization parameters were set to 0.001 and 0.25, respectively, and the batch size 300, learning rate were set to 0.002, the number of hidden layers in this model containing 3 layers with 86, 64, and 16 neurons in each hidden layer, respectively. The GNAM was optimized by the batch size 300, the learning rate were set to 0.002, and the number of hidden layers in this model containing 3 layers with 86, 64, and 16 neurons, respectively. Also, the GNAM optimized by the λ 1 , λ 2 , λ 3 , and λ 4 regularization parameters were set to 0.01, 0.001, 0.4, and 0.03, respectively.
The accuracy, F1-score and AUC of the trained models based on the four imputation strategies are reported in Table 2. Results showed that, the GNAM model had the best performance with the test accuracy, F1-score and AUC value 0.847, 0.691, and 0.774 for the removed missing values strategy (a), and AUC value of 0.855, 0.704,

Remove cases with missing values
Mean imputation KNN imputation MICE imputation Fig. 4 The relative importance of all features selected by random forest classifier based on four imputation strategies (complete case dataset and imputed datasets) and 0.782 for KNN imputation dataset, respectively. The GAM model in all datasets had the worst performance with the test accuracy, F1-score and AUC value. Feature smooth functions learned by ensemble of fifty GNAMs along with the density of the COVID-19 dataset be shown in Fig. 5. The deepteal color indicates the data density for each feature. The darker the bar the more data is present in that area and the trend of the learned feature smooth functions is shown by the red lines. For example, the smooth feature function of age showed that the risk of death in COVID-19 patients increased as age increased and for ages, almost over 60 years, the risk of death slope has the most rapid rise.
The smooth feature function of CR showed three different behavior. The risk of death in COVID-19 patients is constant from the CR values zero to 30, it increased from the values 30 to 80, slowly, and for CR almost over 80, the risk of death slope has the most rapid rise.
The smooth feature function of Lym showed, the risk of death in COVID-19 patients is higher in lower values of Lym counts. So by increasing the Lym counts, the probability of death is decreased. It is also showed, the risk of death is high and constant from the values zero to 50, and the risk of death slope has the most rapid fall.

Discussion
In this study, some effective laboratory findings and demographic features were evaluated to predict the probability of death due to COVID-19 disease. Since the pattern of some features affecting COVID-19 is generally nonlinear, the aim is to determine an appropriate model with the highest prediction accuracy.  The results showed that the risk of death from COVID-19 increases with age. This finding was also confirmed in studies performed on SARS [29], Middle East respiratory syndrome (MERS) [30], and COVID-19 [31]. Because various medical conditions such as hypertension, surgery, hyperlipidemia, and hyperglycemia occur due to old age, this group of patients is susceptible to COVID-19 [32].
Further findings indicated that sex, smoking, compromised immune system, renal insufficiency, PTT, and Plat had no significant effect on death. In Abohamr et al., study, sex, smoking, renal insufficiency, and Plat had a significant effect on death. In the study of Liu et al., smoking, Plat, and PTT factors had no significant effect on death [33]. In the present study, from overall patients, only five patients with a compromised immune system were reported and only one patient experienced death. However, in the study of Kostoff et al., and Yazdanpanah et al., they examined the importance of the immune system feature, and the results of their studies showed that increasing biomarkers of the immune system leads to inflammation and ultimately more damage to other organs and even death [34,35].
Also, the results of the evaluated factors such as dia-  [36][37][38]. As vital components of the immune system, Neutrophils and lymphocytes play a considerable role in host defense and clearance of infections. In the blood, fewer lymphocyte counts may be an important factor in disease severity and mortality in COVID-19 [39]. Ruan et al. and Chen et al. showed that if factors such as the number of lymphocytes, neutrophils, monocytes, and platelets were out of the normal range, it would indicate virus replication and inflammation in COVID-19 patients [40,41]. According to studies on the factors affecting the severity of COVID-19 disease, the number of neutrophils in patients with high disease severity was higher than patients with moderate disease severity [42]. In this study, the mortality risk of COVID-19 patients increased with increasing neutrophil count and with decreasing lymphocyte count.
There are several methods to select important features. Based on this method, ten features BUN, Lym, age, BS, SGOT, Mono, CR, NUT, Alp, HCT, respectively, were the most important effective features in the probability of COVID-19 death were selected whose their relative importance was more than 4%. In the study of Ma et al., Lym, age, NUT, Mono [43], in Aljame et al., Lym, NUT, age [44], and in Subudhi et al., Lym, NUT, BS, and age [45] due to the most important risk factors in COVID-19 death was introduced. Also, the results showed that the relative importance of the selected features by removing missing values or imputation did not change much in terms of the order of relative importance.
Many studies, such as the present study, have predicted mortality from COVID-19 using supervised machine learning techniques. The results of predictive performance comparisons showed overlay GNAM had the best performance with the test accuracy with mean accuracy, F1-score and mean AUC in the test dataset of 0.847, 0.691, and 0.774 respectively.
Li et al. considered six important biomarkers (D-dimer, blood oxygen, Lym to NUT ratio, C-reactive protein (CRP), and lactate dehydrogenase) using the DNN model to predict the mortality of COVID-19 patients. Their model AUC was 0.95 [46]. In a study by Lin et al. of 30 demographic and laboratory biomarkers using machine learning methods including neural network to predict mortality of COVID-19 patients, the accuracy and AUC of this model were 0.91 and 0.88, respectively [47]. In the study of Morales et al., ten important demographic and laboratory biomarkers were used, including age, blood pressure, liver, and kidney failure. The accuracy of predicting the death of COVID-19 patients using the neural network model was 0.88 [48].
An important advantage of the GNAM over other neural networks, including DNN (black box), is its interpretability, which is based on the smooth function graphs of each feature learned from the GNAM. In this regard, instead of inflexible parametric assumptions, the relationship between the output and the input feature is expressed by a smoothing function that can be applied to virtually any form of data.
Considering the 3% of missing values, similar results were obtained in different scenarios of handling missing data. The GNAM model compared to other models had a higher predictive performance. Therefore, the interpretable smooth function graphs related to the GNAM model were drawn and reported for the nonmissing data. The smooth functions learned from fifty fitted GNAMs are visualized in Fig. 5 to interpret the behavior of each feature. With increasing the value of the features age, BUN, BS, CR, SGOT, ALP, HCT, Mono, and NUT, the smooth function of the logarithm odds of COVID-19 mortality in these features increased. However, with increasing the value of Lym, the smooth function of the logarithm odds decreases.
Since the critical time for disease progression has been reported 10 to 14 days from the onset of clinical symptoms, identifying the factors affecting patient mortality from hospitalization makes it possible for decision-making and predictive power for physicians [41]. Also, the use of interpretable machine learning models such as GNAM, compared to common blackbox neural networks, can provide physicians with a broad view of changes resulting from effective variables that reduce or increase the risk of patient mortality. As a result, they increase the accuracy of predicting patient mortality.

Limitations
The samples used in this study are limited to the central city of Hamadan, Hamadan province, Iran, so the number of generalizable samples was less. To identify the features affecting patient mortality, some patient information such as BMI, predisposing factors, or underlying comorbidities due to high missing did not use. These features may also play an efficient role in predicting patient mortality.

Conclusion
When the relationships between predictor features and response are nonlinear, machine learning models such as GNAM can perform well in predicting the mortality of COVID-19 patients. Therefore, the use of interpretable machine learning models such as GNAM helps physicians to prioritize some important demographic factors and laboratory findings by identifying the effective features and the type of predictive trend in disease progression.